Multi-Modal Tracking of Faces for Video Communications
Abstract
This paper describes a system which uses multiple visual processes to detect and track faces for video compression and transmission. The system is based on an architecture in which a supervisor selects and activates visual processes in a cyclic manner. Control of visual processes is made possible by a confidence factor which accompanies each observation. Fusion of results into a unified estimate for tracking is made possible by an error estimate (a covariance matrix) determined for each observation.
The system demonstrates that robust operation can be achieved by coordinating multiple visual processes. Composing a system from a redundant ensemble of processes permits the overall system to adapt automatically to a variety of operational circumstances.
Visual processes for face tracking are described using blink detection, normalised color histogram matching, and cross-correlation (SSD and NCC). Ensembles of visual processes are organised into processing states so as to provide robust tracking. Transitions between states are determined by events detected by the processes. The result of face detection is fed into a recursive estimator (Kalman filter). The output of the estimator drives a PD controller for a pan/tilt/zoom camera. The resulting system provides robust and precise tracking and operates continuously at approximately 20 images per second on a 150 megahertz computer workstation.
The following section reviews the visual process architecture used for the system and describes techniques for estimating a confidence factor and error bounds for visual processes. A tracking process based on a zeroth-order recursive estimator is described.
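As a rough illustration of the zeroth-order recursive estimator mentioned above, the following Python sketch fuses observations from several visual processes into one estimate. It is not the authors' implementation. It assumes a single coordinate with a scalar variance in place of the full covariance matrix, and the names `Observation`, `ZerothOrderEstimator`, `track_step`, `process_noise`, and `min_confidence` are hypothetical. Rejecting observations whose confidence factor falls below a threshold is likewise an assumption about how the confidence factor might be used.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    """One measurement from a visual process (illustrative names)."""
    value: float       # measured coordinate, e.g. face centre x in pixels
    variance: float    # error estimate (scalar stand-in for a covariance matrix)
    confidence: float  # confidence factor in [0, 1]


class ZerothOrderEstimator:
    """Kalman filter with a constant-position (zeroth-order) model."""

    def __init__(self, value, variance, process_noise=4.0, min_confidence=0.5):
        self.value = value
        self.variance = variance
        self.process_noise = process_noise    # assumed growth of uncertainty per cycle
        self.min_confidence = min_confidence  # assumed rejection threshold

    def predict(self):
        # Zeroth-order model: the estimate is unchanged, its uncertainty grows.
        self.variance += self.process_noise

    def update(self, obs):
        # Ignore observations the visual process itself does not trust.
        if obs.confidence < self.min_confidence:
            return False
        # Weight the observation by the relative size of the two variances.
        gain = self.variance / (self.variance + obs.variance)
        self.value += gain * (obs.value - self.value)
        self.variance *= 1.0 - gain
        return True


def track_step(estimator, observations):
    """Run one cycle: predict, then fuse every accepted observation."""
    estimator.predict()
    return sum(1 for obs in observations if estimator.update(obs))
```

An observation with a small variance pulls the estimate strongly toward its value, while a noisy one barely moves it, which is how redundant processes of differing precision can contribute to a single track.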
Visual processes are described for the detection and tracking of faces using blink detection, color, and correlation, as well as processes for estimation and camera control. Visual processes are grouped into states with state transitions triggered by events. An example of an execution trace is provided with the system as configured at the time of writing.
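The grouping of visual processes into states with event-triggered transitions can be sketched as a small table-driven state machine. The state names, event names, and the assignment of processes to states below are hypothetical and chosen only for illustration. The abstract does not specify them.

```python
# Hypothetical states, events, and process groupings (not from the paper).
TRANSITIONS = {
    ("detect", "face_found"): "track",
    ("track", "low_confidence"): "recover",
    ("recover", "face_found"): "track",
    ("recover", "timeout"): "detect",
}

# Visual processes the supervisor would activate in each state (assumed).
PROCESSES = {
    "detect": ["blink_detection"],
    "track": ["cross_correlation", "color_histogram"],
    "recover": ["color_histogram", "blink_detection"],
}


def next_state(state, event):
    """Return the new state, or stay in place if the event is not handled."""
    return TRANSITIONS.get((state, event), state)


def run(events, state="detect"):
    """Replay a sequence of events and return the visited states."""
    trace = [state]
    for event in events:
        state = next_state(state, event)
        trace.append(state)
    return trace
```

Keeping the transitions in a table makes it easy to add a state or regroup processes without touching the control loop.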
Similar resources
Name-It: Naming and Detecting Faces in News Video
We have developed Name-It, a system that associates faces and names in news videos. The system is given news videos, which include image sequences and transcripts obtained from audio tracks or closed caption texts. The system can then either infer possible name candidates for a given face, or locate a face in news videos by name. To accomplish this task, the system takes a multi-modal video ana...
Name-It: Naming and Detecting Faces in News Videos
We have developed Name-It, a system that associates faces and names in news videos. The system is given news videos, which include image sequences and transcripts obtained from audio tracks or closed caption texts. The system can then either infer possible name candidates for a given face, or locate a face in news videos by name. To accomplish this task, the system takes a multi-modal video ana...
Hybridization of Facial Features and Use of Multi Modal Information for 3D Face Recognition
Despite achieving good performance in controlled environments, conventional 3D face recognition systems still encounter problems in handling large variations in lighting conditions, facial expression and head pose. Humans use a hybrid approach to recognize faces, and therefore in this proposed method the human face recognition ability is incorporated by combining global and local ...
Robustification of detection and tracking of faces
Although computing power and transmission bandwidth have both been steadily increasing over the last few years, bandwidth rather than processing power remains the primary bottleneck for many complex multimedia applications involving communication. Current video coding algorithms use, for instance, intelligent encoding to yield higher compression ratios at the cost of additional computing requir...
Robust Tracking and Compression for Video Communication
Principal components analysis has been studied by the computer vision community as a source of features for recognition of faces, objects and scenes [1, 2]. The use of the dominant principal components as "holistic" features for recognition has provided new insights into view invariant and illumination invariant recognition. Unfortunately, applications in object recognition generally require pr...
Publication date: 1997